Denumerable-Armed Bandits
Similar resources
Contextual Multi-Armed Bandits
We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions ...
Infinitely many-armed bandits
We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean reward of a newly selected arm, which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randoml...
Staged Multi-armed Bandits
In conventional multi-armed bandits (MAB) and other reinforcement learning methods, the learner sequentially chooses actions and obtains a reward (which may be missing, delayed, or erroneous) after each action. This reward is then used by the learner to improve its future decisions. However, in numerous applications, ranging from personalized patient treatment to personalized web-...
Mortal Multi-Armed Bandits
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified ...
Anytime many-armed bandits
This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large relative to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art does not offer anytime algorithms handling ManAB problems. Both theory and practice suggest that two problem categories must be distinguished; the easy catego...
Journal
Journal title: Econometrica
Year: 1992
ISSN: 0012-9682
DOI: 10.2307/2951539